Service alarm correlation

ABSTRACT

A system and method for correlating alarms from a plurality of network elements (NEs) are provided to unambiguously associate separate alarms with one another. This is accomplished by a method in which a fault identifier (FID) is generated by a serving NE that has discovered a faulty hardware or software unit. The serving NE signals its lost or degraded service to a client NE in a traffic message and appends the generated FID to the traffic message. The client NE extracts the FID from the traffic message and appends it to a service alarm, which the client NE sends to a network management system. The serving NE also generates an alarm message and provides it with the same FID. The serving NE sends the alarm message and its FID to the network management system. The service alarm and the alarm message received by the network management system will thus contain the same FID. In the management system the FID is used to correlate the two alarms with one another.

This application is the U.S. national phase of international application PCT/SE2004/001769 filed 29 Nov. 2004, which designated the U.S., the entire content of which is hereby incorporated by reference.

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a method, system and network elements for processing alarm information within a telecommunication network managed from a network management system.

DESCRIPTION OF RELATED ART

The objective for an operator supervising the telecommunication network via a network management system is to be able to quickly restore degraded or lost services by locating and correcting the faulty unit causing the degraded or lost service. When alerted of a degraded or lost service, the operator therefore needs to correlate the lost service with the responsible faulty unit.

Network Elements (NEs) in a telecommunication network have different tasks which together aim to connect two or several user equipments (UEs) together. The NEs may depend on each other in such a way that if one NE fails then another NE will fail to provide its services as a consequence (a client-server relationship between NEs). An NE comprises hardware (HW) units and software. Software is stored in a memory and runs under control of a processor and operating system. HW units may further include specific HW units providing the functionality supported by the NE. Within an NE a number of functions execute. These functions may act as serving functions to client functions in other NEs, and if a function is faulty in the serving NE then the client NE will have its service degraded or lost as a consequence. For example, a faulty board in a radio base station (RBS) may show up in a client Radio Network Controller (RNC) as a message “cell disabled” indicating that the operational state of the cell with the RBS containing the faulty board is out of order.

For commercial reasons operators tend to mix NEs from different vendors in their telecommunication networks. To limit dependencies in implementation the information to be shared between NEs is limited. In a radio network typically the information shared to set up radio functions is standardised, but not information about HW equipment.

The disadvantage of not being able to inform a client NE about the faulty HW in the serving NE is that each NE will send an alarm to the network management system, but the alarms are not correlated, i.e. have no unique association to one another. The alarms upon reception in the network management system are time stamped and stored, for example in a database. When the alarms are displayed in an alarm list, the two alarms will be separated by other alarms that have been received from the same or other NEs during the same time period. The operator then has the difficult task of concluding that a service alarm from the client NE is the consequence of faulty HW in the serving NE. The time and competence needed to locate the fault and thereby restore service increase.

SUMMARY OF THE INVENTION

One object of the invention is to provide a solution to the problem of correlating alarms triggered by network elements that depend on one another, such that the alarms are unambiguously associated with one another.

Various aspects of the invention are based on a fault identifier (FID) mechanism, which provides a unique association between the lost service and the responsible faulty unit.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a managed network for a telecommunication system, and

FIG. 2 is a schematic view of a list of collected alarms in the network management system.

DETAILED DESCRIPTION

FIG. 1 shows a network management system (NMS) 1, a client network element (NE) 2 and a serving NE 3. In the serving NE there is a faulty hardware (HW) or software unit 4 and a fault identifier (FID) generator 5. In the client NE there is a FID extractor 6 and faulty service detection means 7. Between the serving NE and the NMS as well as between the client NE and the NMS there are management interfaces 8 and 9 respectively. Between the serving and client NEs there is a traffic interface 10.

The serving NE has the task to set up and maintain a certain set of services, which the client NE is in control over. Typically in a cellular radio network the serving NE is a radio base station (RBS) supplying, for example, user equipments (UEs) in the range of a radio cell with user data such as speech or images.

The client NE in such a cellular radio network is typically a radio network controller (RNC), having the task to control one or several connected RBSes acting as serving NEs. The RNC controls the RBSes to set up and maintain UEs with requested services, for example speech connections, and ensures that UEs can roam between cells served by several RBSes.

An NE emits an alarm when a faulty hardware unit is detected and makes the alarm available over the management interface. The alarm generating mechanism in the NE includes the following information in the alarm:

-   Who: The name of the device or NE experiencing the fault
-   What: The condition of the fault, i.e. the symptom of the fault
-   When: The time the problem was detected
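By way of illustration only, the three items of alarm information listed above could be held in a simple record. The following Python sketch is not part of the described embodiment; the class name `Alarm`, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record carrying the Who/What/When information."""
    who: str        # name of the device or NE experiencing the fault
    what: str       # condition (symptom) of the fault
    when: datetime  # time the problem was detected


# Sample values are invented for illustration.
alarm = Alarm(
    who="RBS1",
    what="board faulty",
    when=datetime(2004, 11, 29, 12, 0, tzinfo=timezone.utc),
)
```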

In addition to the NEs above, a telecommunications network has one or more NMSes, which among other things are used to supervise the NEs. For supervision the NMS has a mechanism to receive and store alarms, for example in a database 11, and to present the alarms to an operator at an operator console 12. The NMS communicates with each respective NE via the management interfaces. Management messages are exchanged between the NEs and the NMS over the management interfaces.

The traffic interface between the client NE and the serving NE provides traffic messages for negotiating the services requested by the client NE. Specifically, in a radio network the interface provides traffic messages to add and delete cells served by the RBS to support a connection between the UE and the RNC. The information exposed over the traffic interface is however limited. This is due to the fact that operators tend to mix NEs from different vendors in their telecommunication networks. Standardization bodies have agreed upon the use of general and implementation-independent service primitives in the traffic interface in order to limit dependencies between vendors' implementations. For example, it is not possible to send information on the identity of a failing HW or software unit over the traffic interface; the only information exposed over the interface is that the serving NE has a failure and cannot deliver the service requested by the client NE.

The disadvantage of not being able to inform a client NE about the faulty HW in the serving NE is that both NEs will send alarms to the network management system, but the alarms are not correlated, i.e. they have no unique association to one another. The operator then has the difficult task of concluding that a service alarm from the client NE is the consequence of faulty HW in the serving NE. The time and competence needed to locate the fault and thereby restore service increase.

An overview of the alarm processing method pursuant to an embodiment of the present invention is provided in FIG. 1. When the serving NE detects the faulty hardware unit it sends an alarm, referred to as a hardware alarm 13, to the NMS. The device (not shown) in the serving NE which discovered the faulty hardware unit retrieves a unique fault identifier (FID) from the FID generator 5. The unique FID is generated by combining an NE's network-unique name with an integer. The integer is derived from a 19-bit variable and is incremented by 1 for every new fault. An example of a FID is RBS1_262143, where RBS1 is the network-unique name of the network element and 262143 is the decimal value of the 19-bit variable.
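The FID generation just described may be sketched as follows. This is a minimal, non-limiting Python illustration: the class name `FidGenerator`, the `start` parameter, the underscore separator and the wrap-around to zero when the 19-bit variable overflows are assumptions made for the example and are not specified by the description.

```python
class FidGenerator:
    """Hypothetical FID generator: NE name combined with a 19-bit counter."""

    COUNTER_BITS = 19  # width of the counter variable, per the description

    def __init__(self, ne_name: str, start: int = 0) -> None:
        self.ne_name = ne_name
        self._modulus = 1 << self.COUNTER_BITS  # 2**19 = 524288
        self._counter = start % self._modulus

    def next_fid(self) -> str:
        """Return a FID for a new fault and step the counter by 1."""
        fid = f"{self.ne_name}_{self._counter}"  # separator is an assumption
        # Assumption: the counter wraps to 0 after reaching 2**19 - 1.
        self._counter = (self._counter + 1) % self._modulus
        return fid


generator = FidGenerator("RBS1", start=262143)
first_fid = generator.next_fid()   # "RBS1_262143"
second_fid = generator.next_fid()  # "RBS1_262144"
```

Because the NE name is unique in the network, FIDs from different NEs cannot collide; uniqueness within one NE holds for as long as the counter has not wrapped around.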

The FID is appended to the hardware alarm and forwarded to the NMS over the management interface 8. The hardware alarm is stored in the database 11 and is presented to the operator in an alarm list 16 to be described below in connection with FIG. 2. Upon detection of the faulty HW or software unit the serving NE also sends a traffic message 15 to the client NE and appends the same FID to it. The traffic message informs the client NE that the requested service is no longer available. The client NE receives the traffic message with the appended FID and generates in response thereto an alarm indicative of the lost service, the alarm referred to as a service alarm 14. In particular the client NE extracts the FID from the traffic message and appends it to the service alarm 14. The service alarm with the appended FID is forwarded to the NMS over the management interface 9. The NMS upon reception stores and presents the service alarm with the appended FID in the alarm list.
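The message flow described above may be illustrated by the following Python sketch. It is a simplified model under stated assumptions, not the implementation of the embodiment: all class, method and field names are hypothetical, the FID is produced by a plain sequence counter, and the interfaces are modelled as return values rather than as network transport.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple


@dataclass
class HardwareAlarm:      # sent by the serving NE to the NMS
    ne: str
    faulty_unit: str
    fid: str


@dataclass
class TrafficMessage:     # sent by the serving NE to the client NE
    service: str
    available: bool
    fid: Optional[str]    # carries no identity of the faulty unit, only the FID


@dataclass
class ServiceAlarm:       # sent by the client NE to the NMS
    ne: str
    lost_service: str
    fid: Optional[str]


class ServingNE:
    def __init__(self, name: str) -> None:
        self.name = name
        self._sequence = count()  # assumption: simple per-NE counter

    def on_fault(self, unit: str, service: str) -> Tuple[HardwareAlarm, TrafficMessage]:
        """Generate one FID and append it to both outgoing messages."""
        fid = f"{self.name}_{next(self._sequence)}"
        alarm = HardwareAlarm(self.name, unit, fid)
        message = TrafficMessage(service, available=False, fid=fid)
        return alarm, message


class ClientNE:
    def __init__(self, name: str) -> None:
        self.name = name

    def on_traffic_message(self, message: TrafficMessage) -> Optional[ServiceAlarm]:
        """Extract the FID and append it to the service alarm."""
        if message.available:
            return None  # service still available, nothing to report
        return ServiceAlarm(self.name, message.service, message.fid)


rbs = ServingNE("RBS1")
rnc = ClientNE("RNC1")
hardware_alarm, traffic_message = rbs.on_fault("board 3", "cell 7")
service_alarm = rnc.on_traffic_message(traffic_message)
```

In this sketch both alarms reach the NMS by separate paths yet carry the identical FID, which is the only coupling between them.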

FIG. 2 shows an alarm list 16 with a number of alarms received from many different serving and client NEs in the cellular radio system. There is one alarm listed on each row and the list can be scrolled. The list is made from the hardware and service alarms stored in the database. The alarms contain the identity of the NE experiencing the fault, such as an RNC or RBS, and also the reported faulty unit or service. The detection time may also be part of each alarm. The alarms are listed in the chronological order in which they were generated and time-stamped in the NEs. As shown, there are many other alarms which have been generated in the same or other NEs supervised by the same NMS during the time span between the reception of the hardware alarm 13 and the reception of the service alarm 14. As shown, it is nevertheless easy to unambiguously associate the hardware and service alarms with each other by their FIDs 17, which both emanate from one and the same detected failure.
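One conceivable way for the NMS to associate alarms by FID is to group the stored alarms on that field. The Python sketch below is illustrative only: the function name `correlate_by_fid`, the dictionary keys and the sample alarms are hypothetical, and the treatment of alarms without a FID (left uncorrelated) is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def correlate_by_fid(alarms: List[dict]) -> Dict[str, List[dict]]:
    """Group alarms sharing a FID; keep only groups with at least two alarms."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for alarm in alarms:
        fid = alarm.get("fid")
        if fid is not None:  # assumption: alarms without a FID stay uncorrelated
            groups[fid].append(alarm)
    return {fid: group for fid, group in groups.items() if len(group) > 1}


# Invented alarm list in order of reception; an unrelated alarm separates
# the hardware alarm from the service alarm it caused.
alarm_list = [
    {"ne": "RBS1", "what": "board faulty", "fid": "RBS1_7"},
    {"ne": "RBS2", "what": "fan failure", "fid": "RBS2_3"},
    {"ne": "RNC1", "what": "cell disabled", "fid": "RBS1_7"},
]
correlated = correlate_by_fid(alarm_list)
```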

While the system and method shown and described are the preferred embodiment, it is apparent that the FID can be generated in other ways than by combining an NE's network-unique name with an integer. A unique FID may be obtained by assigning each NE a number series, with different number series assigned to individual NEs. The FID may thus be generated within each NE as a randomly selected number within the assigned number series.
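The number-series alternative may be sketched as follows. This Python example is a hypothetical illustration: the class name `SeriesFidGenerator`, the example series and the decision never to hand out the same number twice within an NE are assumptions, since the description states only that the number is randomly selected within the assigned series.

```python
import random
from typing import Optional


class SeriesFidGenerator:
    """Hypothetical generator drawing random FIDs from an NE's own number series."""

    def __init__(self, series: range, rng: Optional[random.Random] = None) -> None:
        self._free = list(series)           # numbers not yet used as FIDs
        self._rng = rng or random.Random()

    def next_fid(self) -> int:
        if not self._free:
            raise RuntimeError("number series exhausted")
        # Pick a random unused number; swap it to the end and pop it so the
        # same number is not issued twice (an assumption of this sketch).
        index = self._rng.randrange(len(self._free))
        self._free[index], self._free[-1] = self._free[-1], self._free[index]
        return self._free.pop()


# Disjoint series assigned to two NEs (values invented for illustration).
rbs1_fids = SeriesFidGenerator(range(0, 1000))
rbs2_fids = SeriesFidGenerator(range(1000, 2000))
sample_rbs1 = {rbs1_fids.next_fid() for _ in range(100)}
sample_rbs2 = {rbs2_fids.next_fid() for _ in range(100)}
```

Since the series do not overlap, a FID drawn in one NE can never equal a FID drawn in another.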

1. A system for correlating alarms from a plurality of network elements (NEs) in a telecommunications network, the system comprising: a client NE and a serving NE which depend on each other in such a way that if the serving NE fails, then the client NE will fail to provide its services as a consequence, the serving NE being configured to signal traffic messages to the client NE without providing information on a faulty hardware or software unit in the traffic messages; and a network management system structured to supervise the client and serving NEs and to receive and store alarms from the client and serving NEs, wherein the serving NE comprises: means for generating a fault identifier (FID) related to the faulty hardware or software unit; means for forwarding an alarm message to the network management system and including therein the FID; and means for providing the traffic message with the same FID to the client NE, wherein the client NE comprises: means for extracting the FID from the traffic message from the serving NE and appending the extracted FID to a service alarm message expressing a service fault; and means for forwarding the service alarm message to the network management system, and the network management system receives separately the alarm message from the serving NE and the service alarm message from the client NE.
2. The system in accordance with claim 1, wherein the means for generating the FID generates a randomly selected number for each fault.
3. The system in accordance with claim 1, wherein the means for generating the FID combines a name of the serving NE with an integer, wherein the integer is stepped for every new detected fault.
4. The system in accordance with claim 1, wherein information exposed in an interface between the serving NE and the client NE comprises information to set up and maintain traffic connections between the client and serving NEs, but not information to identify the faulty hardware or software units.
5. The system in accordance with claim 1, wherein the system is a radio network, the serving NE is a radio base station, and the client NE is a radio network controller.
6. A serving network element (NE), comprising: a memory unit structured to store therein program software for a service and for operating the service under a control of a processor; hardware (HW) for the service and for operating the service; an interface towards a client NE to send a traffic message, the interface having service primitives for signaling availability of requested services in the traffic message but having no service primitives for signaling of a faulty hardware or software unit of the serving NE in the traffic message; an alarm interface towards a network management system to send an alarm message; fault detection means for detecting the faulty hardware or software unit in the serving NE; and means for generating the alarm message in response to a detection of the faulty hardware or software unit by the fault detection means, the alarm message being forwarded to the network management system, wherein the means for generating the alarm message comprises: a device for generating unique fault identifiers (FIDs); and a device for appending the generated unique FID to the alarm message sent to the network management system over the alarm interface, and for appending the same unique FID to the traffic message sent to the client NE over the interface towards the client NE, and wherein the alarm message to the network management system and the traffic message to the client NE are separately sent.
7. A client network element (NE), comprising: a memory unit structured to store therein program software for a service and for operating the service under a control of a processor; hardware (HW) for the service and for operating the service; an interface towards a serving NE to receive a traffic message, the interface having service primitives for signaling availability of requested services but having no service primitives for signaling of a faulty hardware or software unit of the serving NE in the traffic message; an alarm interface towards a network management system to send a service alarm message; fault detection means for extracting service primitives in the traffic message received from the serving NE; and means for generating the service alarm in response to the fault detection means extracting service primitives indicative of an inability of the serving NE to provide the requested service, the service alarm being forwarded to the network management system, wherein the fault detection means is structured to extract a fault identifier (FID) appended to the service primitives indicative of the inability of the serving NE to provide the requested service, wherein the client NE further comprises a device for appending the extracted FID to the service alarm generated in response to the fault detection means extracting the service primitives indicative of the inability of the serving NE to provide the requested service, and wherein the alarm message sent to the network management system correlates to an alarm message separately sent by the serving NE to the network management system related to the service primitives received from the serving NE.
8. A method for correlating alarms from a plurality of network elements (NEs) in a telecommunications network including a client NE and a serving NE that depend on each other in such a way that if the serving NE fails, then the client NE will fail to provide its services as a consequence, the method comprising: the serving NE discovering a faulty hardware or software unit and in response thereto forwarding an alarm message indicative of the faulty hardware or software unit to a network management system; the serving NE forwarding to the client NE a traffic message indicative of an inability of the serving NE to provide a requested service, wherein information on the faulty hardware or software unit is not included in the traffic message; the client NE receiving the traffic message and forwarding in response thereto a service alarm message indicative of a lost service to the network management system; and the network management system storing the alarm message from the serving NE and the service alarm message from the client NE and presenting them to an operator, wherein the step of the serving NE forwarding the alarm message to the network management system and the step of forwarding the traffic message to the client NE comprise: the serving NE, upon detection of the faulty hardware or software unit, generating a fault identifier (FID) and associating the FID with the detected faulty unit; and the serving NE appending the FID to the traffic message which it transmits to the client NE and to the alarm message which it transmits to the management system, wherein the step of the client NE forwarding the service alarm message to the network management system comprises the client NE appending the FID to the service alarm message which it transmits to the network management system, and wherein the step of the network management system storing the alarm message from the serving NE and the service alarm message from the client NE and presenting them to the operator comprises the network management system, upon reception of the service alarm message and the alarm message, associating the two alarm messages, which are separately received, to one another using the FID.
9. The method in accordance with claim 8, wherein the FID is a randomly selected number unique for the serving NE.
10. The method in accordance with claim 8, wherein the FID is a combination of a name of the serving NE and an integer, wherein the integer is stepped for every new detected fault.
11. A serving network element in a wireless network, the serving network element comprising: a serving unit structured to provide service for a client network element (serving service); a fault detection unit structured to detect when a fault associated with the serving unit occurs and to generate a fault identifier (FID) associated with the serving unit; a management interface unit structured to communicate with a network management system, the management interface unit sending a fault alarm message indicating the fault associated with the serving unit to the network management system, the fault alarm message having the FID included therein; and a traffic interface unit structured to communicate with a client network element, the traffic interface unit separately sending a traffic message having the FID included therein to the client network element, the traffic message comprising one or more implementation-independent primitives indicating that the serving network element is unable to fulfill the serving service, the implementation-independent primitives being such that information on the serving unit providing the serving service is not included in the traffic message.
12. The serving network element of claim 11, wherein the fault detection unit generates the FID such that FIDs generated due to first and second occurrences of a same fault are different.
13. The serving network element of claim 11, wherein the fault detection unit generates the FID such that each FID generated is unique from all other FIDs generated within the wireless network.
14. A client network element in a wireless network, the client network element comprising: a traffic interface unit structured to communicate with a serving network element, the traffic interface unit receiving a traffic message which comprises one or more implementation-independent primitives indicating that the serving network element is unable to fulfill a serving service, the implementation-independent primitives being such that information on a serving unit of the serving network element providing said serving service is not included in the traffic message; a fault detection unit structured to detect when the serving network element is unable to provide the serving service and to extract a fault identifier (FID) from the traffic message received from the serving network element via the traffic interface unit; and a management interface unit structured to communicate with a network management system, the management interface unit sending a service alarm message indicating that the client network element is unable to provide a service associated with the client network element (client service) to the network management system, the service alarm message having the FID included therein, and the service alarm message being related to a fault alarm message sent separately by the serving network element to the network management system and indicating a fault associated with the serving unit of the serving network element.
15. A network management system in a wireless network, the network management system comprising: a management interface unit structured to communicate with a client network element and a serving network element, the client network element receiving a serving service from the serving network element, and the client and serving network elements communicating with each other via implementation-independent service primitives in which information of a serving unit of the serving network element providing the serving service is not provided by the serving network element to the client network element; a storage unit structured to store therein a fault alarm message received from the serving network element indicating a fault with the serving unit of the serving network element, and to separately store therein a separately received service alarm message from the client network element indicating inability of the client network element to provide a client service, both the fault alarm message and the service alarm message including therein fault identifiers (FID); and an association unit structured to associate the fault alarm message and the service alarm message to each other based on the FIDs.